FXN: Save graphical outputs
picsave <- function(graph, name) {
  # Save a ggplot object as a 12 x 8 inch PDF in the week 3 pics folder
  ggsave(plot = graph, filename = name, device = "pdf",
         width = 12, height = 8,
         path = "~/GitHub/S_Lipkind_Rundergrad2020/week3/pics/")
}
N/A
my_variable <- 10
my_varıable
## [1] 10
#> Error in eval(expr, envir, enclos): object 'my_varıable' not found
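Was this written by someone who speaks Turkish or something? Not sure how else someone would type ı instead of i. Either way, ı (the dotless i, U+0131) is a different character from i, so R treats my_varıable as a separate name that was never assigned, hence the error.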
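A quick way to confirm that the two names really differ is to compare them directly. This is a base-R sketch; the name lookalike is made up for illustration, and "\u0131" is the escape for the dotless i.

```r
my_variable <- 10

# "\u0131" is the dotless i: it looks like "i" but is a different character
lookalike <- "my_var\u0131able"

identical("my_variable", lookalike)  # FALSE: the names differ
utf8ToInt("i")                       # code point 105
utf8ToInt("\u0131")                  # code point 305 (U+0131)
exists(lookalike)                    # FALSE: nothing was assigned to that name
```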
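library(tidyverse)  # ggplot2 (mpg, diamonds, ggplot) and dplyr (filter)

a <- ggplot(data = mpg) +  # fixed typo: dota -> data
  geom_point(mapping = aes(x = displ, y = hwy))
picsave(a, "4.4.2 graph.pdf")
filter(mpg, cyl == 8)  # fixed: = -> == (== tests equality)
filter(diamonds, carat > 3)  # fixed: diamond -> diamonds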
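The = -> == fix matters because = inside filter() is read as naming an argument rather than as a comparison, so dplyr stops with an error instead of filtering. A minimal sketch on a made-up three-row data frame (toy_cars is hypothetical, standing in for mpg):

```r
library(dplyr)

# hypothetical toy data standing in for mpg
toy_cars <- data.frame(model = c("a", "b", "c"), cyl = c(4, 8, 8))

# == compares each value of cyl with 8 and keeps the matching rows
eight_cyl <- filter(toy_cars, cyl == 8)
print(eight_cyl)

# = is treated as a named argument, which filter() rejects with an error
result <- tryCatch(filter(toy_cars, cyl = 8),
                   error = function(e) conditionMessage(e))
print(result)
```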
Oh my goodness, that is amazing: an entire list of keyboard shortcuts. You can also reach the same page from the menus via Help > Keyboard Shortcuts Help.
Find all flights that…
#### 1.1: had an arrival delay of two or more hours
library(dplyr)
library(nycflights13)
head(flights)
filter(flights, arr_delay >= 120)  # delays are in minutes: 2 hours = 120
#### 1.4: Departed in summer (July, August, and September)
filter(flights, month %in% c(7,8,9))
#### 1.5: Arrived more than two hours late, but didn’t leave late
filter(flights, arr_delay > 120 & dep_delay <= 0)  # minutes; early departures (negative dep_delay) also "didn't leave late"
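Both delay columns are measured in minutes, so "two hours" has to be written as 120. A sketch with hypothetical toy rows (the numbers are invented) shows how the 1.5 condition behaves, including an early departure with a negative dep_delay, which still counts as not leaving late:

```r
library(dplyr)

# hypothetical flights; delays in minutes, negative = early
toy_flights <- data.frame(
  id        = 1:4,
  dep_delay = c(0, -5, 10, 0),
  arr_delay = c(130, 125, 150, 2)
)

# arrived more than two hours (120 minutes) late but did not leave late
late_arrivals <- filter(toy_flights, arr_delay > 120, dep_delay <= 0)
late_arrivals$id  # 1 2
```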
#### 1.7: Departed between midnight and 6am (inclusive)
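# note: nycflights13 records midnight as 2400, not 0, so flights leaving at exactly
# midnight are missed here; dep_time <= 600 | dep_time == 2400 would catch them
(d <- filter(flights, dep_time >= 0, dep_time <= 600))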
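Because midnight is stored as 2400 in dep_time, a range check from 0 to 600 skips it. The sketch below uses a hypothetical vector of departure times to show the difference; adding | dep_time == 2400 is one common fix, not the only one.

```r
library(dplyr)

# hypothetical departure times in HHMM form; 2400 means midnight
toy <- data.frame(dep_time = c(1, 517, 600, 601, 2359, 2400, NA))

range_only    <- filter(toy, dep_time >= 0, dep_time <= 600)
with_midnight <- filter(toy, dep_time <= 600 | dep_time == 2400)

range_only$dep_time     # 1 517 600
with_midnight$dep_time  # 1 517 600 2400 (the NA row is dropped by both)
```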
#### 2: Another useful dplyr filtering helper is between(). What does it do? Can you use it to simplify the code needed to answer the previous challenges?
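?between
It’s an inclusive shortcut: between(x, left, right) is equivalent to x >= left & x <= right.
e <- filter(flights, between(dep_time, 0, 600))  # between() already returns TRUE/FALSE, so no %in% is needed
all.equal(d, e)  # check that both versions return the same flights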
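To see why wrapping between() in %in% goes wrong, compare it with between() on its own. between() returns one TRUE/FALSE/NA per element, which filter() can use directly; dep_time %in% between(...) instead asks whether each time equals one of the logical values coerced to numbers (1, 0, NA). A sketch on made-up times:

```r
library(dplyr)

times <- c(0, 359, 600, 601, NA)

# between() is inclusive on both ends and propagates NA
between(times, 0, 600)        # TRUE TRUE TRUE FALSE NA
times >= 0 & times <= 600     # same result

# %in% compares each time to the values 1, 1, 1, 0, NA
times %in% between(times, 0, 600)  # TRUE FALSE FALSE FALSE TRUE
```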
#### 3: How many flights have a missing dep_time? What other variables are missing? What might these rows represent?
filter(flights, is.na(dep_time))  # 8255 rows/flights
# these observations also all have NA dep_delay, arr_time, and arr_delay
# hypothesis: cancelled flights
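One way to answer "what other variables are missing?" without eyeballing rows is to count the NAs in each column of the filtered rows. A sketch on a hypothetical three-row data frame (values invented); the same colSums(is.na(...)) line could be run on the filtered flights:

```r
library(dplyr)

# hypothetical rows: the second one mimics a cancelled flight
toy <- data.frame(
  sched_dep_time = c(515, 600, 545),
  dep_time       = c(517, NA, 554),
  dep_delay      = c(2, NA, 9),
  arr_time       = c(830, NA, 923)
)

no_dep <- filter(toy, is.na(dep_time))

# number of missing values per column among rows with no dep_time
colSums(is.na(no_dep))
```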
arrange(flights, desc(is.na(dep_delay)))  # puts missing values first; this works, but is it the intended solution?
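arrange() sorts NA values to the end even inside desc(), which is why the is.na() trick is needed. Sorting on desc(is.na(x)) first moves the TRUEs (the missing rows) to the top, and adding the column itself as a second key orders the rest. A sketch on made-up data:

```r
library(dplyr)

# hypothetical delays with two missing values
df <- data.frame(id = 1:4, dep_delay = c(5, NA, -3, NA))

arrange(df, dep_delay)$id                          # 3 1 2 4: NAs at the end
arrange(df, desc(dep_delay))$id                    # 1 3 2 4: still at the end
arrange(df, desc(is.na(dep_delay)), dep_delay)$id  # 2 4 3 1: NAs first
```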
arrange(flights, dep_time, desc(dep_delay))
y <- arrange(flights, desc(distance), air_time)
#(select(y, distance, air_time)) -> double-checking
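# top_n() keeps ties, so these can return more than 10 rows;
# newer dplyr code uses slice_max()/slice_min() instead
(longest_distance <- top_n(flights, 10, distance))
(shortest_distance <- top_n(flights, -10, distance))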
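The tie behavior is easy to miss, so here is a sketch on a hypothetical four-row data frame where the two longest trips share a distance. top_n() returns both tied rows, slice_max() does the same by default, and with_ties = FALSE caps the result at n rows:

```r
library(dplyr)

# hypothetical routes; the two longest share the same distance
trips <- data.frame(route = c("A", "B", "C", "D"),
                    distance = c(4983, 4983, 4963, 733))

nrow(top_n(trips, 1, distance))                             # 2: tie kept
nrow(slice_max(trips, distance, n = 1))                     # 2: ties kept by default
nrow(slice_max(trips, distance, n = 1, with_ties = FALSE))  # 1
slice_min(trips, distance, n = 1)$route                     # "D"
```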
#one option: select()
select(flights, day, month, day, month, dep_delay, dep_delay)
Repeating variable names makes no difference: select() keeps each column only once, in the order of its first appearance in the call.
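A small check of this on a made-up one-row data frame (df and its values are hypothetical):

```r
library(dplyr)

df <- data.frame(year = 2013, month = 1, day = 1, dep_delay = 2)

# each column appears once, in the order it is first named
names(select(df, day, month, day, month, dep_delay, dep_delay))
# "day" "month" "dep_delay"
```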
vars <- c("year", "month", "day", "dep_delay", "arr_delay")
select(flights, one_of(vars))
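one_of() lets you keep the columns whose names are stored in a character vector, such as vars above. Newer versions of tidyselect mark one_of() as superseded in favor of all_of() (every name must exist) and any_of() (names that don't exist are skipped).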
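The difference between the two replacements shows up when the vector contains a name that isn't a column. A sketch on a hypothetical one-row data frame; "not_a_column" is made up to trigger the mismatch:

```r
library(dplyr)

df <- data.frame(year = 2013, month = 1, day = 1, dep_delay = 2, arr_delay = 5)
vars <- c("year", "month", "day", "dep_delay", "arr_delay")

names(select(df, all_of(vars)))  # all five columns

extra <- c(vars, "not_a_column")
names(select(df, any_of(extra)))  # unknown name silently skipped

# all_of() refuses names that don't exist
res <- tryCatch(select(df, all_of(extra)), error = function(e) "error")
res
```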
select(flights, contains("TIME"))
The results aren’t too surprising, though I didn’t realize that select helpers such as contains() ignore case by default (ignore.case = TRUE). To make the match case-sensitive, I could set ignore.case = FALSE:
select(flights, contains("time", ignore.case = FALSE))
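The effect of ignore.case is clearest with a column whose name is partly upper case. A sketch on a hypothetical data frame; TIME_zone is an invented column name:

```r
library(dplyr)

df <- data.frame(dep_time = 1, arr_time = 2, TIME_zone = 3, carrier = 4)

names(select(df, contains("TIME")))                       # "dep_time" "arr_time" "TIME_zone"
names(select(df, contains("TIME", ignore.case = FALSE)))  # "TIME_zone"
names(select(df, contains("time", ignore.case = FALSE)))  # "dep_time" "arr_time"
```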
head(flights)